Mid-term evaluation (voluntary, anonymous, ~ 10 min)
HW2 will be posted on Friday
HW1 and project description will be graded this weekend
Check if you forgot to merge from develop branch into master branch
git push them into your repository (even if you have emailed me, just in case…)
Lab keys are for you to check your results.
HW1 will be graded on two questions (selected non-randomly)
Dr. Hua Zhou’s slides
brolgar tutorial
rm(list = ls()) # clean-up workspace
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Big Sur 11.5.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
## [5] readr_2.0.1 tidyr_1.1.3 tibble_3.1.3 ggplot2_3.3.5
## [9] tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.1 xfun_0.25 bslib_0.2.5.1 haven_2.4.3
## [5] colorspace_2.0-2 vctrs_0.3.8 generics_0.1.0 htmltools_0.5.1.1
## [9] yaml_2.2.1 utf8_1.2.2 rlang_0.4.11 jquerylib_0.1.4
## [13] pillar_1.6.2 glue_1.4.2 withr_2.4.2 DBI_1.1.1
## [17] dbplyr_2.1.1 modelr_0.1.8 readxl_1.3.1 lifecycle_1.0.0
## [21] munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 rvest_1.0.1
## [25] evaluate_0.14 knitr_1.33 tzdb_0.1.2 fansi_0.5.0
## [29] broom_0.7.9 Rcpp_1.0.7 backports_1.2.1 scales_1.1.1
## [33] jsonlite_1.7.2 fs_1.5.0 hms_1.1.0 digest_0.6.27
## [37] stringi_1.7.3 grid_4.1.1 cli_3.0.1 tools_4.1.1
## [41] magrittr_2.0.1 sass_0.4.0 crayon_1.4.1 pkgconfig_2.0.3
## [45] ellipsis_0.3.2 xml2_1.3.2 reprex_2.0.1 lubridate_1.7.10
## [49] rstudioapi_0.13 assertthat_0.2.1 rmarkdown_2.10 httr_1.4.2
## [53] R6_2.5.0 compiler_4.1.1
Recall the mpg data:
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # … with 224 more rows
Boxplots (grouped by class):
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot()
coord_cartesian() is the default cartesian coordinate system:
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_cartesian(xlim = c(0, 5))
coord_fixed() specifies the aspect ratio (y / x):
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_fixed(ratio = 1/2)
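To make the meaning of ratio concrete, here is a minimal sketch. The choice of cty and hwy is an illustrative assumption and not part of the original slides. Both axes are in miles per gallon, so ratio = 1 draws one unit on the y-axis at the same length as one unit on the x-axis, and the dashed y = x line should appear at 45 degrees.

```r
library(ggplot2)

# Both variables are measured in mpg, so a 1:1 aspect ratio is meaningful here.
p_fixed <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +  # y = x reference line
  coord_fixed(ratio = 1)  # ratio is expressed as y / x
p_fixed
```

Points above the dashed line are cars whose highway mileage exceeds their city mileage.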
coord_flip() flips the x- and y-axes:
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_flip()
Pie chart:
bar <- ggplot(data = diamonds) +
geom_bar(
mapping = aes(x = cut, fill = cut),
show.legend = FALSE,
width = 1
) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL)
bar + coord_flip()
bar + coord_polar()
A map:
library("maps")
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
nz <- map_data("nz")
head(nz, 20)
## long lat group order region subregion
## 1 172.7433 -34.44215 1 1 North.Island <NA>
## 2 172.7983 -34.45562 1 2 North.Island <NA>
## 3 172.8528 -34.44846 1 3 North.Island <NA>
## 4 172.8986 -34.41786 1 4 North.Island <NA>
## 5 172.9593 -34.42503 1 5 North.Island <NA>
## 6 173.0184 -34.39895 1 6 North.Island <NA>
## 7 173.0229 -34.44662 1 7 North.Island <NA>
## 8 173.0184 -34.49343 1 8 North.Island <NA>
## 9 172.9616 -34.50426 1 9 North.Island <NA>
## 10 172.9181 -34.47367 1 10 North.Island <NA>
## 11 172.9353 -34.52225 1 11 North.Island <NA>
## 12 172.8808 -34.51504 1 12 North.Island <NA>
## 13 172.9049 -34.55646 1 13 North.Island <NA>
## 14 172.9553 -34.53303 1 14 North.Island <NA>
## 15 172.9376 -34.57806 1 15 North.Island <NA>
## 16 172.9760 -34.61227 1 16 North.Island <NA>
## 17 172.9926 -34.56723 1 17 North.Island <NA>
## 18 173.0218 -34.61404 1 18 North.Island <NA>
## 19 173.0396 -34.65902 1 19 North.Island <NA>
## 20 173.0676 -34.70044 1 20 North.Island <NA>
ggplot(nz, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "white", colour = "black")
coord_quickmap() sets the aspect ratio so that the map is drawn to scale:
ggplot(nz, aes(long, lat, group = group)) +
geom_polygon(fill = "white", colour = "black") +
coord_quickmap()
labs()
Figure title should be descriptive:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(title = "Fuel efficiency generally decreases with engine size")
subtitle adds additional detail in a smaller font beneath the title.
caption adds text at the bottom right of the plot, often used to describe the source of the data.
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data from fueleconomy.gov"
)
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
labs(
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)"
)
Mathematical expressions can be used in place of text labels; read about the available options in ?plotmath:
df <- tibble(x = runif(10), y = runif(10))
ggplot(df, aes(x, y)) + geom_point() +
labs(
x = quote(sum(x[i] ^ 2, i == 1, n)),
y = quote(alpha + beta + frac(delta, theta))
)
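quote() works when the label is fixed. When a label should include a value computed from the data, bquote() is one option. The sketch below is illustrative only: the data frame df_demo, the seed, and the choice of the sample correlation as the displayed value are assumptions, not part of the original slides.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df_demo <- data.frame(x = runif(10), y = runif(10))
r <- round(cor(df_demo$x, df_demo$y), 2)

p_math <- ggplot(df_demo, aes(x, y)) +
  geom_point() +
  labs(
    title = bquote(hat(rho) == .(r)),  # .(r) is replaced by the value of r
    x = expression(x[i]),              # subscripted axis labels
    y = expression(y[i])
  )
p_math
```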
Find the most fuel efficient car in each car class:
best_in_class <- mpg %>%
group_by(class) %>%
filter(row_number(desc(hwy)) == 1)
# equivalent to
# best_in_class <- filter(group_by(mpg, class), row_number(desc(hwy)) == 1)
best_in_class
## # A tibble: 7 × 11
## # Groups: class [7]
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 chevrolet corvette 5.7 1999 8 manua… r 16 26 p 2seat…
## 2 dodge caravan … 2.4 1999 4 auto(… f 18 24 r miniv…
## 3 nissan altima 2.5 2008 4 manua… f 23 32 r midsi…
## 4 subaru forester… 2.5 2008 4 manua… 4 20 27 r suv
## 5 toyota toyota t… 2.7 2008 4 manua… 4 17 22 r pickup
## 6 volkswagen jetta 1.9 1999 4 manua… f 33 44 d compa…
## 7 volkswagen new beet… 1.9 1999 4 manua… f 35 44 d subco…
The dplyr::desc function transforms a vector into a format that will be sorted in descending order.
The dplyr::filter function subsets a data frame, retaining all rows that satisfy your conditions.
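A small sketch of how desc() and row_number() combine to pick exactly one row per group, even when there are ties. The vector hwy_demo is made up for illustration. The slice_max() call is shown as one possible alternative, not as the approach used in these slides.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data

hwy_demo <- c(26, 44, 32, 44)  # hypothetical values with a tie for the maximum
desc(hwy_demo)                 # -26 -44 -32 -44: negating reverses the sort order
row_number(desc(hwy_demo))     # 4 1 3 2: ties are broken by position

# A possible alternative: keep one top row per class, dropping ties.
best_alt <- mpg %>%
  group_by(class) %>%
  slice_max(hwy, n = 1, with_ties = FALSE)
nrow(best_alt)                 # 7, one row per class
```

Because row_number() breaks ties by position, only one element ever receives rank 1, which is why filtering on rank 1 returns a single car per class.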
Annotate points:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(colour = class)) +
geom_text(aes(label = model), data = best_in_class)
geom_label() draws a rectangle behind the text:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_label(aes(label = model), data = best_in_class, nudge_y = 2, alpha = 0.5)
The ggrepel package automatically adjusts labels so that they don’t overlap:
library("ggrepel")
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_point(size = 3, shape = 1, data = best_in_class) +
ggrepel::geom_label_repel(aes(label = model), data = best_in_class)
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))
ggplot2 automatically adds default scales, so the plot above is equivalent to:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous() +
scale_y_continuous() +
scale_colour_discrete()
breaks
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_y_continuous(breaks = seq(15, 40, by = 5))
labels
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_x_continuous(labels = NULL) +
scale_y_continuous(labels = NULL)
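breaks and labels can also be combined, and labels accepts a function of the break values. The sketch below is one way to append units to the tick labels. The specific break positions and the unit suffixes are illustrative choices.

```r
library(ggplot2)

p_ticks <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_x_continuous(
    breaks = 2:7,                            # one tick per litre
    labels = function(x) paste0(x, " L")     # labels computed from the breaks
  ) +
  scale_y_continuous(
    breaks = seq(15, 40, by = 5),
    labels = function(y) paste(y, "mpg")
  )
p_ticks
```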
Plot y-axis at log scale:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
scale_y_log10()
Plot x-axis in reverse order:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
scale_x_reverse()
ColorBrewer scales are documented online at http://colorbrewer2.org/
Available via RColorBrewer package
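As a minimal sketch of using a ColorBrewer palette in a plot: scale_colour_brewer() in ggplot2 takes a palette name, and RColorBrewer::brewer.pal() returns the underlying hex codes. The choice of the "Set1" palette and of drv as the coloured variable are assumptions made for illustration.

```r
library(ggplot2)

# Hex codes of the first three colours in the "Set1" palette.
set1 <- RColorBrewer::brewer.pal(3, "Set1")
set1

p_brewer <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  scale_colour_brewer(palette = "Set1")  # discrete ColorBrewer scale
p_brewer
```

RColorBrewer::display.brewer.all() plots every available palette, which helps when choosing one.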
#install.packages("wesanderson")
library(wesanderson)
for (name in names(wes_palettes)) {
print(wes_palette(name))
}
scale_colour_manual() to use predefined mapping between values and colors:
presidential %>%
mutate(id = 33 + row_number()) %>%
ggplot(aes(start, id, colour = party)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))
Use scale_colour_gradient() or scale_fill_gradient() for continuous colour.
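A minimal sketch of a two-colour gradient on a continuous variable. Mapping cty to colour and the particular low and high colours are illustrative assumptions.

```r
library(ggplot2)

p_grad <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = cty)) +                                # cty is continuous
  scale_colour_gradient(low = "lightblue", high = "darkblue")   # low-to-high colour ramp
p_grad
```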
viridis::scale_colour_viridis()
df <- tibble(
x = rnorm(10000),
y = rnorm(10000)
)
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed()
ggplot(df, aes(x, y)) +
geom_hex() +
viridis::scale_fill_viridis() +
coord_fixed()
All color scales come in two varieties:
scale_colour_x() for colour aesthetics
scale_fill_x() for fill aesthetics
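The pairing can be sketched with the discrete viridis scales that ship with ggplot2: the same palette is applied through scale_colour_viridis_d() when the variable is mapped to colour, and through scale_fill_viridis_d() when it is mapped to fill. The two plots below are illustrative choices.

```r
library(ggplot2)

# Points are drawn with the colour aesthetic.
p_colour <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  scale_colour_viridis_d()

# Bars are filled with the fill aesthetic.
p_fill <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar() +
  scale_fill_viridis_d()

p_colour
p_fill
```

Using a colour scale on a plot that only maps fill (or the reverse) has no visible effect, which is a common source of confusion.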
Set legend position: "left", "right", "top", "bottom", "none":
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
theme(legend.position = "left")
See following link for more details on how to change title, labels, … of a legend.
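A short sketch of a few common legend adjustments: labs() renames the legend title, theme() moves it, and guides() with guide_legend() controls its layout and the size of the legend keys. The title text and the specific settings are illustrative.

```r
library(ggplot2)

p_legend <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  labs(colour = "Car class") +                 # legend title
  theme(legend.position = "bottom") +          # move the legend below the plot
  guides(colour = guide_legend(
    nrow = 1,                                  # lay the keys out in a single row
    override.aes = list(size = 4)              # larger points in the legend only
  ))
p_legend
```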
Without clipping (zooms in; all data points are kept, so the smoother is fit to the full data)
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))
With clipping (removes unseen data points)
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
xlim(5, 7) + ylim(10, 30)
This is the same as filtering the data first:
mpg %>%
filter(displ >= 5, displ <= 7, hwy >= 10, hwy <= 30) %>%
ggplot(aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth()
Setting limits on the scales has the same effect:
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
scale_x_continuous(limits = c(5, 7)) +
scale_y_continuous(limits = c(10, 30))
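To see how much data the clipping versions discard, one can count the rows that fall inside the limits. This is a rough sketch that only counts rows; it does not inspect what ggplot2 does internally.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data

# Rows that survive limits of displ in [5, 7] and hwy in [10, 30].
inside <- mpg %>%
  filter(displ >= 5, displ <= 7, hwy >= 10, hwy <= 30)

n_total <- nrow(mpg)      # 234 rows in the full data
n_inside <- nrow(inside)  # rows kept when limits are set on the scales
n_total - n_inside        # rows dropped before the smoother is fit
```

With coord_cartesian() none of these rows are dropped, so geom_smooth() is fit to all 234 observations and only the view is restricted.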
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
theme_bw()
ggplot(mpg, aes(displ, hwy)) + geom_point()
ggsave("my-plot.pdf")
## Saving 7 x 5 in image
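By default ggsave() saves the last plot displayed at the size of the current graphics device, which is where the 7 x 5 in message comes from. A minimal sketch of a more explicit call follows. The output location in tempdir() and the 6 x 4 inch size are illustrative assumptions.

```r
library(ggplot2)

p_save <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Write to a temporary directory so the sketch leaves no files behind in the project.
out_file <- file.path(tempdir(), "my-plot.pdf")
ggsave(out_file, plot = p_save, width = 6, height = 4, units = "in")
file.exists(out_file)
```

Passing plot, width, and height explicitly makes the saved figure reproducible regardless of what was last displayed or how the plot window was resized.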
RStudio cheat sheet is extremely helpful.